
web crawler


Persian

1. Computers and Networking: خزنده وب (web crawler)

Creators: Doug Cutting and Mike Cafarella
Predecessor: Nutch
Subsequent version: YARN (Hadoop 2.0)
Language Hadoop is written in: Java
Philosophy of computation: Divide and conquer applied to large datasets
Principle of computational processing: Bring the computation to the data rather than the data to the computation
System: A distributed programming framework
Main characteristics: Accessible, robust, scalable, simple, and fault tolerant
Storage: Hadoop Distributed File System (HDFS), a self-healing, distributed, shared storage element
Initial computational model: MapReduce, for distributed, aggregated, collaborative parallel processing
Language the MapReduce library is written in: C++
Process type: Batch
Hardware type: Heterogeneous commodity hardware
Software license: Open source
Initial applications: Information retrieval, search indexing, and web crawling
Solution type: A software solution, not a hardware solution
Scalability model: Scale out, not scale up
Typical dataset size: From a few GB to a few TB
Supported dataset size: From tens of TB to a few PB
Coherency model: Write once, read many
Default replication factor: 3
Default HDFS block size: 64 MB
Permission model: Relaxed POSIX model
Main application modules: Mahout, Hive, Pig, HBase, Sqoop, Flume, Chukwa, Pentaho, ...

The main objective of GFS was to support Google's data-intensive applications; initially, it was designed to handle the demands of a web crawler and a file-indexing system under the pressure of rapidly growing data. In this sense, the history of Hadoop is an evolutionary progression that generalized data processing from one particular workload (e.g., a web crawler) to all types of ML workloads (see Fig. 14).

• Create a Lucene index (web crawler): Cafarella noted that text search was the centerpiece of any search engine or web crawler, and it was included in Nutch.
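The "divide and conquer" and "bring the computation to the data" rows above become concrete in the canonical word-count job from the Hadoop MapReduce tutorial, sketched below in Java: the map phase emits a (word, 1) pair for every token, running on whichever node stores each HDFS block, and the reduce phase sums the counts per word. This is a minimal sketch, not any of the specific workloads named in the excerpt; the class names (WordCount, TokenizerMapper, IntSumReducer) follow the tutorial, and the input/output paths are supplied on the command line.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map: runs where the data lives; splits each input line into tokens
  // and emits (word, 1) for every occurrence.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce: receives every count emitted for one word and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // The combiner pre-aggregates counts on each node, so only small
    // (word, count) pairs cross the network: computation goes to the data.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

A job like this is typically submitted with something like "hadoop jar wordcount.jar WordCount /input /output"; HDFS splits the input into blocks (64 MB by default, per the table above) and schedules one map task per block on a node holding a replica of that block.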

Terminology from the Iran Translators Network

